# Multimodal document understanding
Vintern 1B V2 ViTable Docvqa
MIT
A fine-tuned version of the 5CD-AI/Vintern-1B-v2 multimodal model for Vietnamese document question answering over tabular data.
Text-to-Image
Transformers Other

YuukiAsuna
21
2
H2OVL Mississippi 2B
Apache-2.0
H2OVL-Mississippi-2B is a general-purpose vision-language model with 2 billion parameters, developed by H2O.ai. It handles multimodal tasks such as image captioning, visual question answering (VQA), and document understanding.
Image-to-Text
Transformers English

h2oai
91.28k
34